Classification using support vector machines and variables selection

ABSTRACT

A method of deriving a classifier for classifying items using a plurality of variables for characteristics of the items, the method comprising determining a representative subset of the variables for use in said classifier.

CROSS REFERENCE TO RELATED APPLICATIONS

This application is a national phase filing under 35 U.S.C. §371 of International application number PCT/IB2006/002398, filed Jul. 28, 2006.

The invention relates to improvements in classification. The invention is especially applicable to classification of items of currency or value, such as banknotes or coins.

The invention is related to our co-pending applications, EP 1 217 589 A and EP 1 516 293 A, the contents of which are incorporated herein by reference.

In the following, SVM stands for Support Vector Machine, FVS stands for Feature Vector Selection (see EP 1 516 293 A mentioned above) and the term billway stands for the feeding of a bill in a validator in one orientation. Thus, there are 4 billways per denomination.

In a bill acceptor comprising transport means, sensor means, memory and processing means, scanning the document with many sensors and many wavelengths results in a large volume of multivariate data stored in a memory that can be used to discriminate and authenticate the documents, and to denominate a bill in the case of a banknote acceptor. It is desirable for this process to be fast and it is known that a subset of the data can be sufficient to achieve a better result. This application relates to finding an optimal subset of variables representing the data.

Aspects of the invention are set out in the accompanying claims.

This new denomination algorithm deals with the selection of the input data that are given to an SVM (Support Vector Machine) algorithm.

The purpose of the variable selection is to eliminate irrelevant or less relevant variables for classification purposes and at the same time to keep a high separation performance. The idea is to find a set of variables that are highly correlated with the projected data on the separation vector obtained with the SVM when trained with all variables.

The process can be used to generate data inputs for the bill validator to use in order to denominate bills from a pre-defined bill set comprising classes. The classes can be the 4 billways of a single denomination and/or other denominations. The selected variables and discriminant axis generated by the process may be loaded in the memory of the validator. They are used by the validator to later validate a new sample presented to the unit as being a member of one of the classes.

Although the embodiments of the invention are described in the context of denominating bills, the process can be broadly applied to any problem of variable selection, including for example the authentication problem in the context of bill validation.

Embodiments of the invention will be described with reference to the accompanying drawings of which:

FIG. 1 is a schematic diagram of a banknote sensing system;

FIG. 2 is a plan view from above of the sensor array of the sensing system of FIG. 1;

FIG. 3 is a plan view from below of the light source array of the sensing system of FIG. 1;

FIG. 4 is a graph illustrating classification.

A banknote sensing system according to an embodiment of the invention is shown schematically in FIG. 1. The system includes a light source array 2 arranged on one side of a banknote transport path, and a light sensor array 4 arranged on the other side of the banknote transport path, opposite the light source array 2. The system includes banknote transport means in the form of four sets of rollers 6 for transporting a banknote 8 along the transport path between the light source array 2 and the light sensor array 4. The light sensor array 4 is connected to a processor 10 and the system is controlled by a controller 12. A diffuser 14 for diffusing and mixing light emitted from the light source array 2 is arranged between the light source array 2 and the banknote transport path.

FIG. 2 is a plan view from below of the light source array 2. As shown, the light source array is a linear array of a plurality of light sources 9. The array is arranged in groups 11 of six sources, and each source in a group emits light of a different wavelength, the wavelengths being chosen as suitable for the application, usually varieties of blue and red. A plurality of such groups 11 are arranged linearly across the transport path, so that light sources for each wavelength are arranged across the transport path.

FIG. 3 is a plan view from above of the light sensor array 4. As shown, the light sensor array includes eight circular light sensors arranged in a line across the transport path. The sensors are 7 mm in diameter and the centres are spaced 7 mm apart in a line, so that the sensors are side by side.

FIGS. 2 and 3 are not to scale, and the light source and light sensor arrays are approximately the same size.

In operation, a banknote is transported by the rollers 6, under control of the controller 12, along the transport path between the source and sensor arrays 2, 4. The banknote is transported by a predetermined distance then stopped. All the light sources of one wavelength are operated and, after mixing of the light in the diffuser 14 to spread it uniformly over the width of the banknote, the light impinges on the banknote. Light transmitted through the banknote is sensed by the sensor array 4, and signals are derived from the sensors for each measurement spot on the banknote corresponding to each sensor. The light sources of all the other wavelengths are then operated in succession in the same way, with measurements being derived from the sensors for each wavelength, for the corresponding line.

Next, the rollers 6 are activated to move the banknote again by the predetermined distance and the sequence of illuminating the banknote and taking measurements for each wavelength for each sensor is repeated.

By repeating the above steps across the length of the banknote, line by line, measurements are derived for each of the six wavelengths for each sensor for each line of the banknote, determined by the predetermined distance by which the banknote is moved.

As mentioned above, the document is scanned by a linear sensor array that measures spots on the document with a pre-defined scanning resolution in width and length and for multiple wavelengths.

Because the documents can have different dimensions, a full scan can technically yield data sets of different sizes. A discrimination algorithm works by comparing objects using measurement vectors that have to be of a common size. As in practice the documents can have different sizes, if only for tolerance reasons, a maximum scanning area that is common to all the sizes to be denominated is defined. This common size will vary with the bill sets targeted.

By design an SVM classifier works with 2 classes. The discrimination and the variable selection are performed using 2 classes: the first class is the reference class and the second class contains all the bills other than class 1. In the case of denomination, and for practical purposes, the documents used in class 2 are close to those of class 1 in their dimensions; that is, they pass the length and width test of class 1 (it is otherwise trivial to separate documents of different dimensions). Further details of SVM classifiers can be found in prior art literature.

Test samples for the reference class and second class are measured using a sensor array as described above, and the measured values are processed as described below, to generate an SVM with a reduced set of variables.

Let X be the set of data X=x_(ij), i=1 . . . M, j=1 . . . N, where M is the number of bills and N is the number of variables. A variable is a given spot location and a given wavelength. In other words, for each bill i all the variables (length×width×wavelength) are arranged in one vector x_(ij), j=1 . . . N. For a given bill i, the variable j is normalised by subtracting the mean of four tracks in the associated wavelength.

In more detail, the spots used for normalisation are defined by four tracks. The mean of the four tracks is computed for each wavelength and stored in a vector, for example {m₁,m₂,m₃,m₄,m₅,m₆} for six wavelengths. Then each measurement x_(ij) of the associated wavelength k is normalised by subtracting m_(k).
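The normalisation just described can be illustrated with a minimal Python sketch. The function name, the data layout (one flat list of measurements per bill, with a parallel list giving the wavelength index of each variable) and the toy values are assumptions made for illustration only; they are not taken from the described validator.

```python
from statistics import mean

def normalise_bill(measurements, wavelengths, track_spots):
    """Subtract, per wavelength, the mean of the normalisation-track spots.

    measurements : list of floats, one per variable j of a single bill
    wavelengths  : list of ints, wavelength index k of each variable j
    track_spots  : set of variable indices lying on the four tracks
    (all three names are hypothetical)
    """
    # m[k] = mean of the track measurements taken at wavelength k
    m = {}
    for k in set(wavelengths):
        values = [measurements[j] for j in track_spots if wavelengths[j] == k]
        m[k] = mean(values)
    # x_ij <- x_ij - m_k, where k is the wavelength of variable j
    return [x - m[k] for x, k in zip(measurements, wavelengths)]

# Toy bill with two wavelengths (0 and 1) and six variables.
bill = [10.0, 12.0, 20.0, 24.0, 11.0, 23.0]
wl = [0, 0, 1, 1, 0, 1]
tracks = {0, 1, 2, 3}          # spots that belong to the tracks
normalised = normalise_bill(bill, wl, tracks)
```

In this toy case the track means are 11 for wavelength 0 and 22 for wavelength 1, so each variable is shifted by the mean of its own wavelength only.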

For convenience, the whole set of data is normalised before applying the variable selection algorithm.

The normalisation above removes from each variable the global effect of the bill, such as the paper, dust or aging. The scaling process, on the other hand, deals with the dispersion of the whole set of bills. The set of data X is composed of two sets: data related to the reference class, class1, and data related to class2: X=(X¹,X²). The two sets of data are scaled using the mean and the standard deviation of class 1. The scaling is given by:

$X^{1} = \frac{X^{1} - \mathrm{mean}(X^{1})}{\mathrm{std}(X^{1})}, \qquad X^{2} = \frac{X^{2} - \mathrm{mean}(X^{1})}{\mathrm{std}(X^{1})} \qquad (3)$

Where:

$\mathrm{mean}(X^{1}) = \frac{1}{M^{1}} \sum_{i=1}^{M^{1}} x_{ij} \quad \text{and} \quad \mathrm{std}(X^{1}) = \sqrt{\frac{1}{M^{1} - 1} \sum_{i=1}^{M^{1}} \left( x_{ij} - \mathrm{mean}(X^{1}) \right)^{2}}$
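As an illustration of equation (3), the following Python sketch scales both classes variable by variable with the statistics of class 1 only. The function name and the toy data are hypothetical, and the sketch assumes that no variable is constant over class 1 (which would give a zero standard deviation).

```python
from statistics import mean, stdev

def scale_with_class1(X1, X2):
    """Scale both classes column by column with the class-1 mean and std."""
    n_vars = len(X1[0])
    columns = [[row[j] for row in X1] for j in range(n_vars)]
    mu = [mean(col) for col in columns]
    # stdev() uses the M1 - 1 divisor, as in equation (3); a constant
    # column (std of zero) is assumed not to occur in this sketch.
    sd = [stdev(col) for col in columns]

    def scale(X):
        return [[(row[j] - mu[j]) / sd[j] for j in range(n_vars)] for row in X]

    return scale(X1), scale(X2)

# Hypothetical toy data: 3 reference bills, 1 bill of the second class.
X1 = [[1.0, 10.0], [2.0, 20.0], [3.0, 30.0]]
X2 = [[4.0, 20.0]]
X1_scaled, X2_scaled = scale_with_class1(X1, X2)
```

After scaling, the reference class has zero mean and unit standard deviation in every variable, while the second class is expressed in the same units, so its offset from the reference class is preserved.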

A linear SVM is trained with all variables X and the resulting discriminant axis W1 is given by:

W1=SVs1^(t)*Alpha1

where SVs1 is a matrix (L, N) of L support vectors and Alpha1 is the vector of Lagrange multipliers of size L. See Vapnik V., “The Nature of Statistical Learning Theory”, Springer Verlag, 1995 for further details.

The data are projected onto W1. The goal is to find the variables that are together highly correlated with the projected data on the discriminant axis W1.
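A minimal Python sketch of the construction of the discriminant axis and of the projection is given below. It assumes that the SVM training itself has already been carried out elsewhere and that Alpha1 already carries the class sign of each support vector; the function names and the numbers are hypothetical.

```python
def discriminant_axis(support_vectors, alphas):
    """W = SVs^t * Alpha for L support vectors of N variables each."""
    n_vars = len(support_vectors[0])
    return [sum(a * sv[j] for sv, a in zip(support_vectors, alphas))
            for j in range(n_vars)]

def project(X, w):
    """P = X * W: one scalar per bill."""
    return [sum(x * wj for x, wj in zip(row, w)) for row in X]

# Hypothetical output of a trained linear SVM: L = 2 support vectors, N = 3.
SVs1 = [[1.0, 0.0, 2.0], [0.0, 1.0, -1.0]]
Alpha1 = [0.5, -0.5]           # assumed to include the class sign
W1 = discriminant_axis(SVs1, Alpha1)
P = project([[2.0, 2.0, 2.0]], W1)
```

Each bill is thereby reduced to a single value on the axis W1, and it is this vector of values that the selected variables are later required to reproduce.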

In this embodiment, the problem of finding the best S variables is solved by a forward sequential selection. The algorithm starts with an empty set of variables and adds variables until S variables have been selected. At each step, a fitness criterion is evaluated for each candidate variable on the set that combines the already selected variables with that candidate. The candidate giving the maximum fitness is added to the set of selected variables.

Note that the forward sequential selection is just one of many selection algorithms that could be used, such as backward or stepwise selection. See, for example, Fukunaga K., “Introduction to Statistical Pattern Recognition”, Academic Press, INC, 2^(nd) ed. 1990 for further details regarding selection algorithms.

Let us name the discriminant axis W1, the projected data $P = \begin{pmatrix} P^{1} \\ P^{2} \end{pmatrix}$ of class1 and class2, and the set of data restricted to the selected variables S={s₁, . . . , s_(r)} X_(S):

P=X*W1  (4)

X_(S)=(x_(ij)), i=1 . . . M, j=s₁ . . . s_(r).

First of all the projection P and the set of data X are centered by subtracting their means:

P=P−mean(P)  (5)

X=X−mean(X)  (6)

Then the vector P is scaled:

P=P/norm(P)  (7)

Assume that we have already selected S={s₁, . . . , s_(r)} variables and are evaluating the relevance of a variable j that has not yet been taken (j∉S). The fitness defines a kind of correlation between S_(j)=S∪{j} and P:

$F_{S_j} = K_{S_j P}^{t} \, K_{S_j S_j}^{-1} \, K_{S_j P} \qquad (8)$

where $K_{S_j P} = X_{S_j}^{t} P$ and $K_{S_j S_j} = X_{S_j}^{t} X_{S_j}$.
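The fitness criterion (8) can be sketched in plain Python as follows. The helper names are hypothetical, the linear system is solved by ordinary Gaussian elimination rather than by forming the inverse explicitly, and the sketch assumes that the matrix K_SS is invertible (no duplicated or constant variables in the candidate set).

```python
import math

def solve(A, b):
    """Solve A z = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [bi] for row, bi in zip(A, b)]    # augmented matrix
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    z = [0.0] * n
    for r in range(n - 1, -1, -1):                  # back substitution
        z[r] = (M[r][n] - sum(M[r][k] * z[k] for k in range(r + 1, n))) / M[r][r]
    return z

def fitness(X, P, subset):
    """F = K_SP^t K_SS^-1 K_SP for the columns of X listed in `subset`.

    X is assumed centred per (6); P centred and of unit norm per (5), (7).
    """
    cols = [[row[j] for row in X] for j in subset]
    K_SP = [sum(x * p for x, p in zip(c, P)) for c in cols]
    K_SS = [[sum(x * y for x, y in zip(a, b)) for b in cols] for a in cols]
    z = solve(K_SS, K_SP)                           # z = K_SS^-1 K_SP
    return sum(k * zi for k, zi in zip(K_SP, z))

# Toy centred data: variable 0 reproduces P exactly, variable 1 is orthogonal.
X = [[1.0, 1.0], [-1.0, 1.0], [0.0, -2.0]]
P = [1 / math.sqrt(2), -1 / math.sqrt(2), 0.0]
f0 = fitness(X, P, [0])
f1 = fitness(X, P, [1])
```

With P centred and of unit norm, this quantity is the squared length of the projection of P onto the space spanned by the chosen variables, so it lies between 0 and 1. In the toy data, variable 0 gives a fitness of about 1 and variable 1 a fitness of about 0.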

The variable giving the best fitness is added to the set of selected variables. We stop selecting variables when the fitness reaches a given value (0.998) or if we have selected a predefined number of variables.

The steps of the algorithm can be summarised as set out below:

-   1. Construct the whole set of normalised variables.
-   2. Then scale the data according to (3).
-   3. Train the SVM with the whole set of variables. SVM returns the support vectors that will be used to construct the discriminant vector W1.
-   4. Project all the data on to W1: P=X*W1;
-   5. Center the vector P according to (5);
-   6. Scale the vector P using its norm: P=P/norm(P);
-   7. Center X according to (6);
-   8. Initialisation of selection: R={1 . . . N}, S={ };
-   9. Repeat
    -   a) For all remaining variables in R, evaluate the fitness criterion F_(S_j) (8) of the variable j. F is the list of all fitnesses: F=(F_(S_j)), j∈R
    -   b) Find the best variable that maximizes the fitness: best=arg max_(j)(F)
    -   c) Add the variable best to the set of selected variables: S=S∪{best};
    -   d) Remove the variable best from the remaining list of variables: R={1, . . . , best−1, best+1, . . . , N};

    until the number of selected variables or the fitness reaches a given value.
-   10. Retrain the SVM with the selected variables and retain the discriminant axis W2 given by W2=SVs2^(t)*Alpha2 as the final discriminator. SVs2 and Alpha2 are the support vectors and the Lagrange multipliers of the second run of SVM.
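Steps 4 to 9 above can be sketched in Python as follows. This is only an illustrative sketch under stated assumptions: the data X are taken to be already normalised and scaled (steps 1 and 2), the axis W1 is taken as given (step 3, SVM training, is outside the sketch, as is the retraining of step 10), all function names are hypothetical, and candidate sets whose matrix K_SS is numerically singular are simply skipped, which is a choice of this sketch and not part of the described method.

```python
import math

def _solve(A, b):
    """Solve A z = b by Gaussian elimination; None if A is near singular."""
    n = len(b)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        if abs(M[p][c]) < 1e-12:
            return None
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    z = [0.0] * n
    for r in range(n - 1, -1, -1):
        z[r] = (M[r][n] - sum(M[r][k] * z[k] for k in range(r + 1, n))) / M[r][r]
    return z

def _fitness(cols, P):
    """Fitness (8) for the given list of centred columns."""
    K_SP = [sum(x * p for x, p in zip(c, P)) for c in cols]
    K_SS = [[sum(x * y for x, y in zip(a, b)) for b in cols] for a in cols]
    z = _solve(K_SS, K_SP)
    return None if z is None else sum(k * zi for k, zi in zip(K_SP, z))

def forward_select(X, W1, max_vars, target=0.998):
    """Forward sequential selection of the variables that best rebuild P."""
    n_bills, n_vars = len(X), len(X[0])
    # Step 4: project all the data onto W1.
    P = [sum(x * w for x, w in zip(row, W1)) for row in X]
    # Steps 5 and 6: centre P, then scale it to unit norm.
    p_mean = sum(P) / n_bills
    P = [p - p_mean for p in P]
    norm = math.sqrt(sum(p * p for p in P))
    P = [p / norm for p in P]
    # Step 7: centre every variable (column) of X.
    cols = []
    for j in range(n_vars):
        col = [row[j] for row in X]
        c_mean = sum(col) / n_bills
        cols.append([v - c_mean for v in col])
    # Step 8: initialise the remaining and selected sets.
    R, S = list(range(n_vars)), []
    best_f = 0.0
    # Step 9: add the best variable until a stopping condition is met.
    while R and len(S) < max_vars and best_f < target:
        scores = {}
        for j in R:
            f = _fitness([cols[s] for s in S + [j]], P)
            if f is not None:          # skip singular candidate sets
                scores[j] = f
        if not scores:
            break
        best = max(scores, key=scores.get)
        best_f = scores[best]
        S.append(best)
        R.remove(best)
    return S, best_f

# Toy data: 4 bills, 3 mutually orthogonal variables; W1 ignores variable 1.
X = [[1.0, 1.0, 1.0], [-1.0, 1.0, -1.0], [1.0, -1.0, -1.0], [-1.0, -1.0, 1.0]]
selected, reached = forward_select(X, W1=[2.0, 0.0, 1.0], max_vars=3)
```

In the toy data, variable 0 alone reaches a fitness of 0.8, adding variable 2 brings it to about 1, and the loop stops there without ever taking variable 1, which carries no weight on the axis.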

The above discussion relates to selecting the subset of variables to be used in the banknote tester. Parameters representing the selected variables and the discriminant axis are stored in the memory of the banknote tester and used for subsequent testing (authentication, denomination etc).

More specifically, to test a banknote, the banknote is sensed by the sensor array, and the measurements for the selected variables are processed by the SVM. The measurements are used to build a test vector, which is projected onto the discriminant vector W2. This results in a scalar value which can be compared with one or more thresholds. For example, with a single threshold, the banknote can be treated as belonging to the reference class or not depending on whether the scalar value is less than or greater than the given threshold. Similarly, with two thresholds, the banknote can be treated as belonging to the reference class if the scalar value lies between the two thresholds, and otherwise not.
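The test procedure can be sketched in Python as follows. The function name, the threshold values and the measurements are hypothetical; the sketch also assumes that the measurements have already been normalised and scaled in the same way as the training data, and it omits any offset term that a trained SVM may add to the projection.

```python
def classify(measurements, selected, W2, low=None, high=None):
    """Project the selected variables onto W2 and compare with threshold(s).

    With only `low` (or only `high`) set, a single threshold is used;
    with both set, they define an acceptance window.
    """
    test_vector = [measurements[j] for j in selected]
    score = sum(x * w for x, w in zip(test_vector, W2))
    accepted = (low is None or score >= low) and (high is None or score <= high)
    return score, accepted

# Hypothetical stored parameters: two selected variables and their axis.
selected = [0, 2]
W2 = [1.0, -2.0]
bill = [0.5, 9.0, -0.25, 1.0]

score, in_window = classify(bill, selected, W2, low=0.5, high=1.5)
_, above_single = classify(bill, selected, W2, low=1.2)
```

Only the selected measurements enter the projection, so the cost of testing a bill grows with the number of selected variables rather than with the full number N.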

The processing for deriving the subset of variables and discriminant axis can be performed either in the banknote tester itself, or in a separate device, with the parameters representing the selected variables and the discriminant axis being subsequently stored in the memory of a banknote tester for later use.

As discussed above, this approach gives performance similar to that obtained with the original N variables while actually using fewer variables, so that there is less processing and hence faster results.

It is also possible not to norm P and use another range for the fitness criterion. The higher the value of the fitness, the better the reconstruction of the projected data. It is convenient to fix this value to a given level smaller than 1 to avoid numerical problems, for instance 0.998, and to stop the selection when this value is reached.

Practical tests in a bill validator for various denominations have shown that about 20 to 40 variables are sufficient for keeping a high performance, whereas some other variants require more, for example 64 variables.

FIG. 4 shows the SVM results for the denomination of 2 specific denominations with all variables, in this case 357 (top), and with 16 variables (bottom).

It can be seen that in this case, all the objects of the 2^(nd) class are on one side of the reference class and the classes can be separated by a single threshold. Alternatively the reference class can be enclosed between 2 values defining an acceptance window to cover the possibility that a foreign object could be classified on the other side of the reference class.

In the embodiment, the subset of variables is selected using an SVM classifier, and then the selected subset of variables is used for classification using another SVM classifier. However, once the relevant variables, which are representative of the original variables and corresponding input data, have been selected, any suitable type of classification can be carried out using the relevant variables, such as LDA (linear discriminant analysis) or Mahalanobis distance, or similar, as known to the person skilled in the art. The representative subset is chosen to reduce the number of variables whilst maintaining adequate performance in terms of classification, so that the selected variables are representative of the original variables and the corresponding data. This can be evaluated in various ways, such as using a fitness function as described above.

References to banknotes include other similar types of value sheets such as coupons and cheques, and include genuine and fake examples of such documents. A system may involve the use of means, such as edge-detectors, for detecting the orientation, such as skew and offset, of a banknote relative to, eg, the transport direction and/or the sensor array or a fixed point(s). Alternatively, a system may include means for positioning a banknote in a desired orientation, such as with the length of the bill along the transport path with edges parallel to the transport direction, or at a desired angle relative to the transport direction and/or sensor array.

The described embodiments are banknote testers. However, the invention may also be applied to other types of currency testers, such as coin testers. For example, signals from a coin tester taking measurements of coin characteristics, such as material, at a succession of points across a coin may be interpolated to produce a signal representative of the characteristic across the coin.

The term “coin” is employed to mean any coin (whether valid or counterfeit), token, slug, washer, or other metallic object or item, and especially any metallic object or item which could be utilised by an individual in an attempt to operate a coin-operated device or system. A “valid coin” is considered to be an authentic coin, token, or the like, and especially an authentic coin of a monetary system or systems in which or with which a coin-operated device or system is intended to operate and of a denomination which such coin-operated device or system is intended selectively to receive and to treat as an item of value.

The invention claimed is:
 1. A method of deriving a classifier for classifying currency bills using a plurality of variables for characteristics of the bills, the method comprising: determining a representative subset of the plurality of variables for use in said classifier by using a first classifier and the plurality of variables to select the representative subset of the plurality of variables based on a fitness criteria describing a correlation of at least some of the plurality of variables with input data projected on a discriminant axis of the first classifier; wherein the bills are classified between a reference class which is one billway of a denomination and a second class containing at least 3 other billways of the same denomination.
 2. The method of claim 1 wherein the first classifier is a support vector machine (SVM) classifier.
 3. The method of claim 1 wherein the fitness criteria, F_(S_j), is defined according to the equation $F_{S_j} = K_{S_j P}^{t} \, K_{S_j S_j}^{-1} \, K_{S_j P}$.
 4. A method of deriving a classifier for classifying items using a first classifier and a plurality of variables for characteristics of the items, the method comprising: determining a first subset of the plurality of variables for use in said classifier; selecting a second subset of the plurality of variables, by incrementally increasing or decreasing said first subset of variables with a chosen variable that increases or decreases a fitness criteria, the fitness criteria describing a correlation of at least some of the plurality of variables with input data projected on a discriminant axis of the first classifier.
 5. The method of claim 4 comprising evaluating each subset using a fitness criteria.
 6. The method of claim 4 comprising increasing or decreasing said first subset of the plurality of variables until a predetermined number of variables and/or a predetermined fitness criterion level is reached.
 7. The method of claim 1 comprising scaling input data.
 8. The method of claim 7 wherein scaling is based on a subset of input data.
 9. The method of claim 8 wherein the scaling is based on input data for the reference class.
 10. The method of claim 9 wherein the scaling of input data is based on a mean and a standard deviation of the input data.
 11. The method of claim 9 wherein scaling the input data involves subtracting a mean of the input data from the input data and dividing by a standard deviation of the input data.
 12. The method of claim 7 comprising taking an absolute value of the scaled data.
 13. A method of deriving a classifier for classifying items using a plurality of variables for characteristics of the items, the method comprising: determining a representative subset of the variables for use in said classifier; constructing a whole set of normalized variables, scaling input data according to an equation $X = \frac{X - \mathrm{mean}(X)}{\mathrm{std}(X)}$, training support vector machine (SVM) classifier with substantially the whole set of variables, constructing a discriminant vector W1, projecting substantially all the input data onto W1 such that P=X*W1, centering the vector P according to an equation P=P−mean(P), scaling the vector P using its norm such that P=P/norm(P), scaling X according to an equation X=X−mean(X), initializing a selection R={1 . . . N}, S={ }, repeating a sequence until a number of selected variables or a fitness criterion reach a given value, the sequence comprising: for all remaining variables in R, evaluate the fitness criterion $F_{S_j} = K_{S_j P}^{t} \, K_{S_j S_j}^{-1} \, K_{S_j P}$ of the variable j, wherein F is the list of all fitnesses, and is defined according to an equation F=(F_(S_j)), j∈R, find a best variable that maximizes the fitness according to an equation best=arg max_(j)(F), add the variable best to the set S of selected variables according to an equation S=S∪{best}, remove the variable best from the list of remaining variables R={1, . . . , best−1, best+1, . . . , N}; upon completion of the sequence, retraining the SVM with the selected variables and retaining a discriminant axis W2=SVs2^(t)*Alpha2 as a final discriminator, wherein SVs2 and Alpha2 are support vectors and Lagrange multipliers of a second run of SVM.
 14. The method of claim 1 wherein the second class further comprises a plurality of other denominations.
 15. The method of claim 4 for classifying coins between a reference class which is a coin of a first denomination and a second class which is at least one coin of another denomination.
 16. A method of testing an item of currency using the classifier derived using the method of claim 4.
 17. The method of claim 4 wherein the items are different from the data set used to derive the classifier.
 18. The method of claim 17 comprising measuring a currency item, extracting data for the selected variables, and using the data for the selected variables and the classifier to derive a scalar value.
 19. The method of claim 18 comprising projecting the data for the selected variables onto a derived classifier discriminant axis to derive the scalar value.
 20. The method of claim 18 comprising comparing the scalar value with a threshold.
 21. The method of claim 20 comprising accepting or rejecting a currency item depending on whether the scalar is above or below the threshold.
 22. The method of claim 18 comprising comparing the scalar value with two thresholds.
 23. The method of claim 22 comprising accepting a currency item if the scalar is between the two thresholds and rejecting it otherwise.
 24. A method of manufacturing a currency validator comprising storing a representation of the classifier derived using the method of claim 1 in the validator.
 25. The method of claim 24 comprising storing a representation of a classification function and the subset of variables.
 26. The method of claim 24 comprising storing one or more of factors representing support vectors and the discriminant axis and the subset of variables.
 27. A currency validator comprising: at least one sensor; at least one data processor; and memory storing a representation of a support vector machine (SVM) classifier derived using a method of deriving a classifier for classifying currency bills using a plurality of variables for characteristics of the bills, the method comprising: determining a representative subset of the plurality of variables for use in said classifier by using a first classifier and the plurality of variables to select the representative subset of the plurality of variables based on a fitness criteria describing a correlation of at least some of the plurality of variables with input data projected on a discriminant axis of the first classifier; wherein the bills are classified between a reference class which is one billway of a denomination and a second class containing at least 3 other billways of the same denomination.
 28. An apparatus comprising: at least one data processor; and memory storing instructions which, when executed by the at least one data processor, causes the at least one data processor to perform operations for deriving a classifier for classifying currency bills using a plurality of variables for characteristics of the bills, the operations comprising: determining a representative subset of the plurality of variables for use in said classifier by using a first classifier and the plurality of variables to select the representative subset of the plurality of variables based on a fitness criteria describing a correlation of at least some of the plurality of variables with input data projected on a discriminant axis of the first classifier; wherein the bills are classified between a reference class which is one billway of a denomination and a second class containing at least 3 other billways of the same denomination.
 29. The method of claim 4, wherein said chosen variable is not in the first subset of the plurality of variables when the first subset of the plurality of variables is increased and said chosen variable is in the first subset of the plurality of variables when the first subset of the plurality of variables is decreased.